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This article considers estimation of constant and time-varying 
coefficients in nonlinear ordinary differential equation (ODE) models 
where analytic closed-form solutions are not available. The numeri- 
cal solution-based nonlinear least squares (NLS) estimator is investi- 
gated in this study. A numerical algorithm such as the Runge-Kutta 
method is used to approximate the ODE solution. The asymptotic 
properties are established for the proposed estimators considering 
both numerical error and measurement error. The B-spline is used 
to approximate the time-varying coefficients, and the corresponding 
asymptotic theories in this case are investigated under the framework 
of the sieve approach. Our results show that if the maximum step size 
of the p-order numerical algorithm goes to zero at a rate faster than 
f j- 1 /CP A4 ) ) the numerical error is negligible compared to the measure- 
ment error. This result provides a theoretical guidance in selection of 
the step size for numerical evaluations of ODEs. Moreover, we have 
shown that the numerical solution-based NLS estimator and the sieve 
NLS estimator are strongly consistent. The sieve estimator of con- 
stant parameters is asymptotically normal with the same asymptotic 
co-variance as that of the case where the true ODE solution is ex- 
actly known, while the estimator of the time- varying parameter has 
the optimal convergence rate under some regularity conditions. The 
theoretical results are also developed for the case when the step size 
of the ODE numerical solver does not go to zero fast enough or the 
numerical error is comparable to the measurement error. We illus- 
trate our approach with both simulation studies and clinical data on 
HIV viral dynamics. 
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1. Introduction. Ordinary differential equations (ODE) are widely used 
to model dynamic processes in many scientific fields such as engineering, 
physics, econometrics and biomedical sciences. In particular, new biotech- 
nologies allow scientists to use ODE models to more accurately describe 
biological processes such as genetic regulatory networks, tumor cell kinetics, 
epidemics and viral dynamics of infectious diseases [Chen, He and Church 
(1999), Jansson and Revesz (1975), Michelson and Leith (1997), Daley and 
Gani (1999), Anderson and May (1991), Brookmeyer and Gail (1994), Nowak 
and May (2000)]. The mathematical modeling approach has made a great 
impact on these scientific fields over the past decades. For instance, ODE 
models have been used to quantify HIV viral dynamics which resulted in 
important scientific findings [Ho et al. (1995), Wei et al. (1995), Perelson et 
al. (1996, 1997)]. Comprehensive reviews of the application of ODE models 
in HIV dynamics can be found in Perelson and Nelson (1999), Nowak and 
May (2000), Tan and Wu (2005) and Wu (2005). 

Although differential equation models have been widely used in scientific 
research, very little statistical research has been dedicated to parameter esti- 
mation and inference for differential equation models. The existing statistical 
methods for ODE models include the nonlinear least squares method [Bard 

(1974) , van Domselaar and Hemker (1975), Benson (1979), Li, Osborne and 
Pravan (2005)], the smoothing-based techniques [Swartz and Bremermann 

(1975) , Varah (1982), Chen and Wu (2008), Liang and Wu (2008), Brunei 
(2008)], the principal differential analysis (PDA) [Ramsay (1996), Heckman 
and Ramsay (2000), Poyton et al. (2006), Ramsay et al. (2007), Varziri et 
al. (2008)] and the Bayesian approaches [Putter et al. (2002), Huang, Liu 
and Wu (2006), Donnet and Samson (2007)]. However, very few of these 
publications rigorously address the theoretical issues and study the asymp- 
totic properties of the proposed estimators when both measurement error 
and numerical error are significant. In this paper, we intend to investigate 
statistical estimation methods for both constant and time-varying parame- 
ters in ODE models and study the asymptotic properties of the proposed 
estimators under the framework of the sieve approach. 

Denote a general set of ODE models containing only constant parameters 

as 



and denote a general set of ODE models with both constant and time- varying 
parameters as 



(1.1) 



< 



' dX(t) 
dt 



F{t,X(t),/3} 



Vie [to, T] 



„ X(io) — Xq, 



(1.2) 




F{i,X(i),/3,7?(i)} 



Vte[«o,T] 



X(t ) = X 
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where X(i) = {X\(t), . . . ,Xjc(t)} is a K-dimensional state variable vector, 
(3 is a d-dimensional vector of unknown constant parameters with true value 
j3 , rj(t) is an unknown time- varying parameter with true value 770 (i) (here 
we only consider a single time- varying parameter, the proposed methodology 
can be extended to include multiple time-varying parameters although it is 
tedious and cumbersome in notation), F(-) = {iq(-), . . . ,Fk(-)} t is a vector 
of differentiable functions whose forms are known and X(io) = Xo is the 
initial value. Equations (1.1) and (1.2) are called state equations. Obviously, 
equation (1.1) is a special case of (1.2). The function F(i,X,/3) in (1.1) or 
F(i, X, /3, 77) in (1.2) is assumed to fulfil the Lipschitz assumption to X [with 
the Lipschitz constant independent of the unknown parameters j3 and ry(-)] 
ensuring existence and uniqueness of the solutions to (1.1) and (1.2) [see 
Hairer, N0rsett and Wanner (1993) and Mattheij and Molenaar (2002)]. Let 
X(t,/3) and X(t,/3, rj(t)) denote the true solutions to (1.1) and (1.2) for given 
/3 and rj(-), respectively. We usually use notation X(i) to denote X(i,/3 ) 
or X(t, (3 , r/o (i)) in this article. Our objective is to estimate the unknown 
parameters (3 and rj(-) based on the measurements of the state variables, 
X(£) or their functions. 

If a closed-form solution to (1.1) or (1.2) is available, the standard statis- 
tical approaches for nonlinear regression or time-varying coefficient regres- 
sion models can be used to estimate unknown parameters. In practice, (1.1) 
and (1.2) usually do not have closed-form solutions for a nonlinear F. In this 
case, numerical methods such as the Runge-Kutta algorithm [Runge (1895), 
Kutta (1901)] have to be used to approximate the solution of the ODEs for a 
given set of parameter values and initial conditions. Consequently, the non- 
linear least squares (NLS) principle (minimizing the residual sum of squares 
of the differences between the experimental observations and numerical so- 
lutions) can be used to obtain the estimates of the unknown parameters. 
The NLS method for (1.1) was first described by mathematicians in 1970s 
[Bard (1974), van Domselaar and Hemker (1975), Benson (1979)]. The NLS 
method was also widely used to estimate the unknown parameters in ODE 
models in the fields of mathematics, computer science and control engineer- 
ing. In the 1990s, the NLS method was extended to estimate time-varying 
parameters in (1.2). For example, the NLS method with spline approxima- 
tion to time-varying parameters has been successfully applied to pharma- 
cokinetic [Li et al. (2002)], physiologic [Thomaseth et al. (1996)] and HIV 
studies [Adams (2005)]. 

Though the NLS was the earliest and the most popular method devel- 
oped for estimating the parameters in ODE models, so far the proposed 
NLS estimators and their asymptotic properties for ODE models have not 
been systematically studied, in particular, for time-varying parameter esti- 
mates. The influence of the numerical approximation error of ODEs on the 
asymptotic properties has not been analyzed. All existing studies took the 
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numerical solution as the true solution and did not consider the difference be- 
tween them. The difficulty is due to the co-existence of both measurement 
error and numerical error, and the standard theories of the NLS method 
[Jennerich (1969), Malinvaud (1970), Wu (1981), Delgado (1992)] cannot be 
directly applied. In this article, we intend to fill this gap. 

The rest of the article is organized as follows. In Section 2, we discuss the 
identifiability problem of ODE models. Then we introduce the numerical 
solution-based NLS estimators for (1.1) and (1.2), and study their asymp- 
totic properties in Sections 3 and 4, respectively. The asymptotic properties 
of the proposed estimators, including strong consistency, rate of convergence 
and asymptotic normalities, are established using the tools of empirical pro- 
cesses [Pollard (1984, 1990), Pakes and Pollard (1989), van der Vaart and 
Wellner (1996), Ma and Kosorok (2005), Wellner and Zhang (2007)] and the 
sieve methods [Grenander (1981), Shen and Wong (1994), Huang (1996) and 
Shen (1997)]. We perform simulation studies to investigate the finite-sample 
performance of the proposed estimation methods in Section 5. In this sec- 
tion, we also apply the proposed approaches to a set of ODE models for HIV 
dynamics. We provide a summary and discussion for the proposed methods 
in Section 6. Finally, the proofs of all the theoretical results are given in the 
Appendix. 

2. Identifiability of ODE models. Identifiability of ODE models is a criti- 
cal question to answer before parameter estimation. To verify the uniqueness 
of parameter estimates for given system inputs and outputs, both analyt- 
ical and numerical techniques have been developed for ODE models since 
1950s. Before jumping into technical details, two commonly used definitions 
of identifiability are given as follows [Bellman and Astrom (1970), Cobelli, 
Lepschy and Jacur (1979), Walter (1987), Ljung and Glad (1994), Audoly 
et al. (2001), Jeffrey and Xia (2005)]. 

Definition 1. Globally identifiable: a system structure is said to be 
globally identifiable if for any two parameter vectors /3 1 and 2 m the pa- 
rameter space B, X(t,/3 1 ) = X(i,/3 2 ) can be satisfied for all t if and only if 

01=02' 

However, global identifiability is a strong condition to satisfy and usu- 
ally difficult to verify in practice. Therefore, the definition of at-a-point 
identifiability was introduced by Ljung and Glad (1994) and Quaiser and 
Monnigmann (2009) as follows. 

Definition 2. At-a-point identifiable: a system structure is said to 
be locally (or globally) identifiable at a point (3^ if for any f3 within an 
open neighborhood of 0* (or within the entire parameter space), X(i,/3) = 
X(i,/3„,) can be satisfied for all t if and only if (3 = 0*. 



SIEVE ESTIMATION IN ORDINARY DIFFERENTIAL EQUATION MODELS 5 



A number of methods have been proposed for identifiability analysis of 
ODE models, including structural [Bellman and Astrom (1970), Ljung and 
Glad (1994), Xia and Moog (2003)], practical [e.g., Rodriguez-Fernandez, 
Egea and Banga (2006), Miao et al. (2008)] and sensitivity-based [e.g., Jolliffe 
(1972), Quaiser and Monnigmann (2009)] approaches. Due to the limited 
space, we may not be able to provide an exhaustive list of publications on 
identifiability of ODE models. In this article, the structural identifiability 
analysis techniques are of particular interest mainly due to the theoretical 
completeness. 

Various structural identifiability approaches have been proposed, such as 
power series expansion [Pohjanpalo (1978)], similarity transformation [Va- 
jda et al. (1989), Chappel and Godfrey (1992)] and implicit function theo- 
rem method [Xia (2003), Xia and Moog (2003), Miao et al. (2008), Wu et 
al. (2008)]. Particularly, Ollivier (1990) and Ljung and Glad (1994) intro- 
duced another approach in the framework of differential algebra [Ritt (1950), 
Kolchin (1973)]. The differential algebra approach is suitable to general non- 
linear dynamic systems, and it has been successfully applied to nonlinear 
differential equation models, including models with time-varying parame- 
ters [Audoly et al. (2001)]. In this article, the differential algebra approach 
is employed to verify the identifiability of ODE models with both constant 
and time- varying parameters. 

For most structural identifiability analysis techniques such as the implicit 
function theorem method and the differential algebra approach, a key step is 
the elimination of latent variables via taking derivatives and algebraic oper- 
ations, which makes such techniques suitable for multivariate ODE models 
with partially observed state variables. After all unobserved state variables 
are eliminated, equations involving only given inputs, measured outputs and 
unknown parameters can be obtained. If we consider the parameters as un- 
knowns, it is easy to verify that the identifiability of unknown parameters 
is determined by the number of roots of these equations. 

For illustration purposes, we consider a classical HIV dynamic model with 
both constant and time- varying parameters [Nowak and May (2000), Huang, 
Rosenkranz and Wu (2003), Wu et al. (2005)] as an example: 

jTuit) = A - P Tu(t) - n(t)Tu(t)V(t), 

( jT I {t) = V {t)T u {t)V{t)-5T I {t), 

jV(t) = N6T I (t)-cV(t), 

where Tjj is the concentration of uninfected target CD4+ T cells, Tj the 
concentration of infected CD4+ T cells, V{t) the viral load, A the prolif- 
eration rate of uninfected CD4+ T cells, p the death rate of uninfected 
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CD4+ T cells, rj(t) the time-varying infection rate depending on antiviral 
drug efficacy, 5 the death rate of infected cells, c the clearance rate of free 
virions, N the number of virions produced by a single infected cell on av- 
erage. This model will also be used in our numerical studies in Section 5. 
For notational simplicity, let x\, X2 and x% denote Tjj, Tj and V, and let 
yi = Tjj + Ti = x\ + X2 and 2/2 = V = X3 denote the measurable outputs, 
respectively. Then (2.1) can be re- written as 

{x' x = A - pxi - r](t)xix 3 , 
x 2 = V(t) x l x 3 ~ & x 2i 
x f 3 = N5x2 — CX3, 

where x' x ,x 2 and x' 3 denote the derivatives of x\,X2 and X3, respectively. 
We adopt the following ranking for variable elimination [Ljung and Glad 
(1994)], 

(2.3) r]~<y2~<yi~<l3^ x 3^ x 2^ x l, 

where (3= (A, p, N,5,c) T is the vector of constant unknown parameters. By 
taking the higher order derivatives on both sides of (2.2) and using some 
algebra elimination techniques, we can eliminate x\, X2 and X3 from (2.2) 
using the ranking (2.3) to obtain 

(2.4) yf ] + (p + 5)y[ + 5py 1 -SX + r,(t)y 2 (y[ + 5y x - A) = 0, 

(2.5) yP + {5 + c)y' 2 + 5cy 2 - r?(t)y 2 (iV^i cy 2 ) = 0, 
(2) (2) 

where y\ and y 2 denote the second-order derivative of yi(t) and y2{t), 
respectively. Therefore, rj(t) can be expressed in terms of measurable system 
outputs and other constant unknown parameters either from (2.4) as 

(2.6) V (t) = vP + b+fM + Sm-s* 

-y 2 (yi + <5yi-A) 

or from (2.5) as 

(2 7) n(t) - V? + (S + c)y' 2 + 5cy2 

{ ' V[ ) ~ y2(NS yi -y' 2 -ay 2 ) ' 

Thus, rj{t) is identifiable if all the constant unknown parameters are identi- 
fiable. To verify the identifiability of all unknown parameters 6 = ((3 T ,rj) T , 
equations (2.6) and (2.7) can be combined to obtain 

yi 2/22/2 - y'my^ - 5yiy2y { 2 2) + ^yf 1 - {5 + c)y[y 2 y 2 

+ {p5 + p + 5-5 2 - 8c)yiy2y 2 + cy 2 y 2 

(2.8) 

+ pcy'm 2 + {pSc - 5 2 c)yiy 2 2 - N5y 1 yf ) y 2 

+ cyf ] y 2 2 - N5(p + 5)y iy ' iy 2 - N5 2 py 1 2 y 2 + N5 2 \ yi y2 = 0. 
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The equation above only involves measurable system outputs [(Tjj + T/), V 
and their derivatives] and constant unknown parameters. We assume that 
the derivatives of (Tjj + Tj) and V exist and are continuous up to order 
2. Although the derivatives of (Tjj + Tj) and V are usually not directly 
measured in experiments, for theoretical identifiability analysis, they are 
known once (Tjj + Tj) and V are measured (e.g., via numerical evaluation). 
Finally, it can be verified that (2.8) is of order and of degree > 1 in 6, so 
(2.8) satisfies the sufficient conditions given in Ljung and Glad (1994) and 
(3 = (A, p, N, S, c) T is thus at-a-point identifiable at the true parameter point. 
Therefore, r/(i) is also at-a-point identifiable at the true parameter point. 
For more detailed techniques for identifiability analysis of ODE models, we 
refer readers to the references listed above. 

3. ODE models with constant parameters. Throughout this article, we 
let ||a|| be the Euclidean norm (or L2 norm) of a vector (or a matrix) a; 
|| A||oo = maxi<j< m ^j=il a «jl be the supremum norm of an m x n matrix 
A, where a,j is the (i, j)th element of A; A® 2 = AA T for a matrix A; 
C r [a, b] be the class of functions with r-order continuous derivative on the 
interval [a, b]; \\f\\oo = sup 4 |/(i)| be the supremum norm of a function /; 
and x Ay denotes min(x,y). Moreover, for a random vector Z ~ P, where 
P is a probability measure, we let ||/(Z)|| 2 = ||/||p,2 = (/ fdP) 1 / 2 be the 
L2(-P)-norm of a function /. 

3.1. Measurement model and estimator. In this section, we consider ODE 
models with constant parameters, that is, equation (1.1), over the time 
range of interest I = [to,T] (—00 < to < T < +00), where the initial value 
Xo = X(io) is assumed to be known in this article. In reality, X(t) cannot 
be measured exactly and directly; instead, its surrogate Y(t) can be mea- 
sured. For simplicity, here we assume an additive measurement error model 
to describe the relationship between X(tj) and the surrogate Y(£j), 

(3.1) Y(t l ) = X(t l ) + e(t i ), 

at random or fixed design time points ti,...,t n , where the measurement 
errors (e(t{), . . . , e(t n )) are independent with mean zero and a diagonal 
variance-covariance matrix S. Moreover, in the case of random design, as- 
sume that the measurement errors are independent of X(i). Equation (3.1) 
is called the observation or measurement equation. 

If (1.1) does not have a closed- form solution, we need to resort to numer- 
ical techniques to obtain numerical solutions at discrete time points. In this 
article, we consider a general one-step numerical method. Let to = s o < s l < 
• • • < s m _i = T be grid points on the interval /, hj = Sj — Sj-i be the step 
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size and h = maxi<j< m _i hj be the maximum step size, and and X^ +1 
be the numerical approximations to the true solutions X(s,) and X(s,-_|_i), 
respectively, which can be typically written as 

(3.2) Xj +1 = ^ + fcfcfo.Xj, Xj +1 , h), 

where the specific form of $ depends on the numerical method. The common 
numerical methods include the Euler backward method, the trapezoidal rule, 
the r-stage Runge-Kutta algorithm (r is usually between 2 and 5), and so 
on. Among these algorithms, the 4-stage Runge-Kutta algorithm [Mattheij 
and Molenaar (2002), page 53, Hairer, N0rsett and Wanner (1993), page 134] 
has been well developed and widely used in practice. Therefore, we employ 
the 4-stage Runge-Kutta algorithm as an example in our numerical studies. 

Define e h = maxo<j< m _i||X(sj) — X^||, which is called the numerical error 
or the global discretization error [Hairer, N0rsett and Wanner (1993), page 
159, Mattheij and Molenaar (2002), page 57]. If e h = 0(/i p ), p is called the 
order of the numerical method. It is necessary to establish a relationship 
between the number of grid points m (or the maximum step size h) and 
the sample size of measurements n since the asymptotic properties of the 
proposed estimators are related to both numerical error and measurement 
error. To our best knowledge, this is the first attempt to establish such as a 
relationship. 

Following Mattheij and Molenaar [(2002), page 58] the interpolation tech- 
nique is commonly used if the measurement points (U,i = 1,2, ... ,n) are 
not coincident with the grid points (sj,j = 1,2, . . . ,m — 1) of the numerical 
method, and the cubic Hermite interpolation is often adopted. Let X(£,/3) 
denote the interpolated numerical solution of X(t,/3) obtained from the nu- 
merical method for given j3, and then (3.1) can be approximately rewritten 
as Y(i) ~ X(i,/3 ) + e(t). The simple numerical solution-based NLS estima- 
tor (3 n of (3 Q minimizes 

n K 

(3.3) E^) = EEK(^) - 

i=i j=i 

Note that if the data are correlated or the measurement variances are het- 
erogeneous, the weighted NLS can be used. The theoretical results can be 
extended to the weighted NLS. Also note that we can easily obtain the 
estimator X(t) = ~K(t,f3 n ) for X(i). 

To minimize the NLS objective function (3.3), the standard gradient op- 
timization methods may fail due to the complicated nonlinear ODE model 
and the NLS objective function may have multiple local minima or may be 
ill-behaved [Englezos and Kalogerakis (2001)]. Fortunately, various global 
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optimization methods are available to more reliably solve the parameter es- 
timation problem for ODE models, although the global optimization meth- 
ods are very computationally intensive. Moles, Banga and Keller (2004) 
compared the performance and computational cost of seven global opti- 
mization methods, including the differential evolution method [Storn and 
Price (1997)]. Their results suggest that the differential evolution method 
outperforms the other six methods with a reasonable computational cost. 
Improved performance can be achieved using a hybrid method combining 
gradient methods and global optimization methods. A hybrid method based 
on the scatter search and sequential quadratic programming (SQP) has been 
proposed by Rodriguez- Fernandez, Egea and Banga (2006), who showed that 
the hybrid scatter search method is much faster than the differential evolu- 
tion method for a simple HIV ODE model. In addition, Miao et al. (2008) 
also suggested that global optimization methods should be used for general 
nonlinear ODE models. Here we combine the differential evolution, the scat- 
ter search method and the SQP local optimization technique to implement 
our NLS minimization. 



3.2. Asymptotic properties. In this section, we study the asymptotic 
properties of the proposed numerical solution-based NLS estimator when 
both measurement error and numerical error are considered. First we make 
the following assumptions: 

Al. (3 G B, where B is a compact subset of lZ d with a finite diameter Rp. 
A2. Vt = {X(i,/3) :t G 1,(3 G £>} is a closed and bounded convex subset of 
K K . 

A3. There exist two constants — oo < ci < C2 < +oo such that c\ < Y(£) < 
C2 for all t £ I. 

A4. All partial derivatives of F(i, X, (3) up to order p with respect to t and 

X exist and are continuous. 
A5. The numerical method for solving ODEs is of order p. 
A6. For any /3 G B, E t [X(t,(3) - X(i,/3 )] 2 = if and only if (3 = (3 . 
A7. The first and second partial derivatives, dX Qp^ and ^q^q^t ; exist and 

are continuous and uniformly bounded for all t G / and f3 G B. 
A8. For the ODE numerical solution X(i,/3), the first and second partial 

derivatives, 9X Qp® and ^q^q^t , exist and are continuous and uni- 
formly bounded for all t G / and j3 G B. 
A9. Let < C3 < C4 < 00 be two constants. For random design points, 
ti,...,t n are i.i.d. The joint density function (p(t,y) of (i,Y) satis- 
fies c 3 < <p(t,y) < c 4 for all (t,y) G [t ,T] x [c 1 ,c 2 j. 
A10. The true parameter /3 is an interior point of B. 
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All. is an interior point of B, where = argminggg Eo[Y(t) — X(i,/3)] r x 
[Y(t) — X(i, (3)] and £?o is the expectation with respect to Pp , the joint 
probability distribution of (t, Y(t)) at true value (3 . 

A12. ^i = {Et{^^)y l E t {^^){E t (^^)}^ is positive defi- 

nite, where Et[g(t)] is expectation of function g(t) with respect to t. 
A13. V 1 = {^(g^)}- 1 i5; (g[Y(t)-X(t^r 2 ^){^(g^)}- 1 is 
positive definite. 

Assumptions A1-A4 are general requirements for existence of numeri- 
cal solutions of ODE models. Assumption A5 from Mattheij and Molenaar 
(2002, pages 55 and 56) defines the precision of the numerical algorithm. 
For example, the Euler backward method, the trapezoidal rule, the 4-stage 
and 5-stage Runge-Kutta are of order 1, 2, 4 and 5, respectively. Theorem 
2.13 in Hairer, N0rsett and Wanner [(1993), page 153] provides sufficient 
and necessary conditions for the numerical method to be of order p. The- 
orems 3.1 and 3.4 in Hairer, N0rsett and Wanner [(1993), pages 157 and 
160] give the magnitude of the numerical error of the numerical algorithms. 
Assumption A6 is required for identifiability and imposed for consistency. 
From Section 2, we know that the HIV model (2.1) is at-a-point identifiable 
at the true value j3 Q . This result and assumption A9 are sufficient condi- 
tions for assumption A6 to be satisfied. Assumptions A7-A9 are needed for 
consistency. Assumptions A10-A13 are needed for the proof of asymptotic 
normality in Theorem 3.2. 

Theorem 3.1. Assume that there exists a A > such that h = 0(n~ x ), 
then under assumptions Al— A10, we have n — (3 — > 0, almost surely under 

Theorem 3.2. (i) For h = 0(n~ x ) with A > l/(pA 4) where p is the 
order of the numerical method (3.2), under assumptions A1-A10 and A12, 

we have that n 1 / 2 (0 n - (3 ) A N(0, Vi). 

(ii) For h = 0(n~ x ) with 0<A< A 4), under assumptions A1-A9, 

All and A13, we have that n l / 2 {0 n - 0) 4- N(0, Vi) with \\0 - (3 \\ = 
Q( h (pM)/2} = („-A(pA4)/2) and || Vj - T4H = 0(/i(p a4 )/ 2 ) = 0(n~ A (P A4 )/ 2 ). 

The detailed proofs of Theorems 3.1 and 3.2 are provided in the Appendix. 
The basic idea for the proofs is motivated by Pakes and Pollard (1989) in 
which a general central limit theorem is proved for a broad class of sim- 
ulation estimators, that is, the objective function of the estimator is too 
complicated to evaluate directly, and instead the Monte Carlo simulation 
is used to approximate the objective function to obtain the estimator. The 
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asymptotic properties of the simulation-based estimator are established us- 
ing a general central limit theorem under nonstandard conditions given in 
Huber (1967) and Pollard (1985), which are often called the Huber-Pollard 
Z-theorem [see Theorem 3.3.1 in van der Vaart and Wellner (1996)]. In this 
article, we use the same theorem to prove the asymptotic normality of the 
numerical solution-based NLS estimator for ODEs. Similarly, our objective 
function Hi in (3.3) cannot be directly evaluated; instead we have to ap- 
proximate it by solving (1.1) numerically. Thus, similar ideas in Pakes and 
Pollard (1989) can be borrowed to establish the asymptotic results of our 
estimator in Theorems 3.1 and 3.2. 

Remark 1. Theorems 3.1 and 3.2 can be extended to fixed design points 
U G [to,T] (i = l,...,n). Assume that there exists a distribution function 
Q(t) with corresponding density (p(t) such that, with Q n (t), the empiri- 
cal distribution of (t±, . . . ,t n ), sup fe [ f0) y]|Q n (t) — Q(t)\ = O p (n~ 1 / 2 ) and <p(t) 
is bounded away from zero and has continuous second-order derivative on 
[to,? 1 ]. Define E t [g(t)} be the integral g(t) dQ{t) for function g(t). Simi- 
larly we can prove Theorems 3.1 and 3.2 for the fixed design if we replace 
assumption A9 by above assumption. 

Remark 2. From the proof of Theorem 3.2 in the Appendix, we still 
have \\f3-f3 Q \\ =0{h^/ 2 ) and \\V X - V x \\ = 0(^ M )/ 2 ) for a fixed constant 
h, which is independent of the sample size re. This suggests that, if the 
maximum step size h of the numerical algorithm for solving ODEs is a 
fixed constant, the numerical solution-based NLS estimator is not consistent. 
Instead the asymptotic bias is in the order of h^ M V 2 . 

Notice that our asymptotic results provide a theoretical foundation for the 
relationship between the numerical step size and sample size, that control 
numerical error and measurement error, respectively, for the widely-used 
NLS estimator based on the numerical solutions of the ODEs. Intuitively, 
the smaller the numerical step size is, better the estimator is. However, a 
smaller step size will increase the computational cost and this may become 
a serious problem when the ODE system is large and the computational cost 
is high. It is important to study the trade-off between the numerical error 
and measurement error when the computational cost needs to be taken into 
consideration. Our theoretical results show that, only when the numerical 
step size, which controls the numerical error and computational cost, goes 
to zero with a rate faster than a particular rate n~ 1 ^ pM \ the numerical 
solution-based NLS estimator converges to the true value of the parameters 
with the rate of root-re. In addition, the asymptotic variance of the NLS 
estimator is the one as if the true solution X(t) is exactly known. 
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The asymptotic variance-covariance matrix needs to be estimated in or- 
der to perform statistical inference for unknown parameters (3. There are 
some standard methods that can be used. The first approach is to use the 
observed pseudo-information matrix based on the NLS objective function 
(3.3). The observed pseudo-information matrix is defined as Zi(/3) = — . 

— 1/2 " 

The standard error of (3 n can then be approximated by X 1 (P n )/^/n. In 
practice, we have noted that the inverse of the observed pseudo-information 
matrix provides a reasonable approximation to the asymptotic variance- 
covariance matrix V\. Rodriguez-Fernandez, Egea and Banga (2006) also 
proposed this approach for parameter inference in ODE models. 

The second approach is the weighted bootstrap method [Ma and Kosorok 
(2005)]. Let Wi, i = l,...,n, denote n i.i.d. positive random weights with 
mean one [i£(W) = 1] and variance one [Var(W) = 1]. The weights, Wi are 
independent of {/3,t,Y(t)}. For (1.1), the weighted M-estimator satisfies 

n K 

Pi = argmin^^ WjYjiU) - X^/3)] 2 . 
i=i j=i 

From Corollary 2 and Theorem 2 in Ma and Kosorok (2005), given {ti, Y(ij)}, 
V™(fln ~ fin) an d V™(Pn ~ Po) have the same limiting distribution, then 
the weighted M-estimator /3° can be used for inference on (3 n . 

Note that the empirical bootstrap has been used for statistical inference 
for ODE models [Joshi, Seidel-Morgenstern and Kremling (2006)]. However, 
the asymptotic properties of the empirical bootstrap estimators are quite 
difficult to derive. This is why we propose to use the weighted bootstrap 
method instead of the empirical bootstrap approach. 

4. ODE models with both constant and time-varying parameters. 

4.1. Measurement model and estimator. In this section, we consider (1.2) 
with both constant and time-varying parameters, where the initial value 
Xo = X(to) is assumed to be known. Again, X(i) is not observed directly in 
practice; instead, we observe its surrogate Y(t) through (3.1). 

Let A be the following class of functions, 

(4.1) A= {?? G C»[t ,T] : |t/")(zi) - r ] ^\z 2 )\ < L\ Z1 - z 2 p}, 

where fi is a nonnegative integer, 7 £ (0,1], g = fi + 7 > 0.5, and L an 
unknown constant. The smoothness assumption is often used in nonpara- 
metric curve estimation. Usually, either g = 1 (i.e., [i = and 7 = 1) or 
g = 2 (i.e., /i = 1 and 7 = 1) should be satisfied in various situations. Denote 
= ((3 T ,T]) T . Then the parameter space is denoted by O = {0 : (3 € B, r\ 6 
A} = BxA. 
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In this article, we use the method of sieves to approximate 770 (i) on the 
support interval [to,T] of t. The basic idea of the sieve approach is to ap- 
proximate an infinite-dimensional parameter space Q by a series of finite- 
dimensional parameter spaces Q n , which depend on the sample size n, and 
then to estimate the parameter on the finite-dimensional spaces Q n instead 
of G. The concept of sieve was first proposed by Grenander (1981). Since 
then, the sieve method has been a powerful tool in the area of nonparametric 
and semiparametric statistics [Shen and Wong (1994), Huang (1996), van 
der Vaart and Wellner (1996), Section 3.4, Shen (1997), Huang and Rossini 
(1997), Huang (1999), He, Fung and Zhu (2002), Xue, Lam and Li (2004) 
and Huang, Zhang and Zhou (2007)]. 

Here we apply the sieve estimation method to (1.2) with a time- varying 
parameter. First, we approximate r] (t) by B-spline functions on the sup- 
port interval I of t. Let to = u o < u i < • • • < u q = T be a partition of the 
interval /, where q = 0(n v ) (0 < v < 0.5) is a positive integer such that 
ma,xi<j< q \uj — Uj—i\ = 0(n~ v ). Then we have N = q + l normalized B-spline 
basis functions of order I + 1 > g [see Huang (2003), page 1618] that form 
a basis for the linear spline space. We denote these basis functions in the 
forms of a vector ir(t) = (Bi(t), . . . , B]\[(t)) T with which rj(t) can be approx- 
imated by ir(t) T ot, where ex = (a\, . . . ,ctN) T G 1Z N is the spline coefficient 
vector with cxq corresponding to rjo(t). Such approximation is extensively 
used in nonparametric and semiparametric problems [Stone (1985), Shen 
and Wong (1994), Shen (1997), Huang (1999) and Huang (2003)]. The read- 
ers are referred to Schumaker [(1981), page 118] for more details about the 
construction of the basis functions. Regression spline approximation to a 
nonparametric function can always be expressed as a linear function of basis 
functions so that the problem of time- varying coefficients can be transformed 
into an estimation problem for a number of constant parameters. Thus the 
estimation methods and computational algorithms developed for (1.1) with 
constant coefficients in Section 3 can be employed for (1.2) with both con- 
stant and time- varying parameters. 

For any Oi G B x A (i = 1, 2), we define a distance 



where t n < rS 21 1 )/I 2/ '( 2 '+ 1 )] with a constant I' arbitrarily close to / [see Shen 
(1997), page 2560], then Q n = {0 : (5 £ B, rj G A n } = B x A n can be used 
as a sieve of G. In fact, for any 6 = (3 T ,rj) T G Q, by Corollary 6.21 in 
Schumaker (1981), there exists rj n G A n such that \\rj n — rj^oo = O p {n~ ve ). 



(4.2) 



d(0i,O 2 ) 



Pi -A2 II + \\m -mh- 



Denote set 
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Denote n = (/3 T ,r/ n ) T G 6 n , then d(0,0 n ) = O p (rT ve ). Equation (1.2) now 
becomes 

^«F{t,X(t),/3,,r(t) T a}- 

For this approximation model, let X(i, (3, 7t(t) T cx) be the numerical approx- 
imation of X(t,/3, ??(/;)) that can be obtained from the same numerical al- 
gorithm as described in Section 3. Equation (3.1) can be approximated by 
Y(i) X(i, (3, ■K{t) T olq) +s(t). The numerical solution-based sieve NLS es- 
timator n = (f3n,r) n ) T is defined as 

n K 

(4.3) n = arg inf H 2 (0) = arg inf V V [Yj (i 4 ) - Xj (U , (3, V (t) )] 2 . 
eee„ 6»ee„ 

When we substitute the sieve NLS estimators # n into the numerical approx- 
imation, we can obtain the estimator X(t) = ~K(t, /3 n ,f) n (t)). 

4.2. Asymptotic properties. The empirical objective function for the sieve 
NLS method proposed in Section 4.1 is a second-order loss function which is 
not a likelihood function. We cannot use the standard information calcula- 
tion of the maximum likelihood estimator (MLE) based on orthogonal pro- 
jections in semiparametric models [Bickel et al. (1993)], and the asymptotic 
normality theory for semiparametric MLEs obtained by Huang (1996, The- 
orem 6.1) does not apply to our case. Fortunately, Ma and Kosorok (2005) 
and Wellner and Zhang (2007) extended the Huang's asymptotic normality 
results to more general semiparametric M-estimators by using a so-called 
pseudo-information calculation. We are able to employ these new asymp- 
totic results to asymptotic properties of the proposed sieve NLS estimator, 
and the following additional assumptions are needed: 

Bl. The true time- varying parameter r/o(-) G A, where A is denoted in (4.1). 

B2. All partial derivatives of F up to order p with respect to t, X, and rj, 
respectively, exist and are continuous. 

B3. For any (3 G B and r, G A, E t [X(t, (3,i](t)) - X(t, (3 , Vo(t))] 2 = if and 
only if /3 =(3 and P{t : r)(t) = rj (t)} = 1. 

B4. The first and second partial Frechet-derivatives [van der Vaart and Well- 
ner (1996), page 373] in the norm d defined in (4.2), 8X( ^ /3 ' r?) , 9X{ ^ ' v) , 

9 dpdP^ ' 9 d/^?'^ anc ^ 9 ^dri^'^ ex ^ an d are continuous and uni- 
formly bounded for all t G /, (3 G B and rj £ A. 
B5. For the ODE numerical solution X(i,/3, 17), the first and second par- 
tial Frechet-derivatives in the norm d, dx^A 7 ?) , l9X g^ 3 ' ?? ^ , 9 ^ag§$ > 

- d^fy^ and 8 X g*2 3 '^ exist and are continuous and uniformly bounded 
for alH G I, (3 G B and ?? G A. 
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B6. For K > 2, V2 = Sj~ 1 S2(S^ 1 ) T is positive definite, where Si and S2 are 
denned in (A. 5) and (A. 6) in the Appendix, respectively. 

B7. v satisfies the restrictions 0.25/ g < v < 0.5 and v(2 + g) > 0.5, where g 
is the measure of smoothness of rj(t) defined in assumption (Bl). 

Theorem 4.1. Assume that there exists a A > such that h = 0(n~ A ) 
and under assumptions A1-A4, A9, A10 and B1-B5, we have d(0 n , 0$) — > 0, 
almost surely under Pg . 

Theorem 4.2. Assume that there exists a A> l/[2(p A 4)] such that 
h = 0(n~ x ) where p is the order of the numerical algorithm (3.2), and under 
assumptions A1-A4, A9, A10 and B1-B5, we have d(0 n ,0o) = O p (n~ vg + 

From Theorem 4.2, we know that \\/3 n - (3 \\ = O p (n~ ve + n"^ 1 ^/ 2 ) and 
||»7n(t) - y?o(*)||2 = O p {n~ ve + n-^- v ^ 2 ). If v = 1/(1 + 2g), the rate of con- 
vergence of fi n is n~ g /( 1+2g \ which is the same as the optimal rate of the 
standard nonparametric function estimation [Stone (1982)]. Theorem 4.3 
below states that the rate of weak convergence of j3 n achieves n -1 / 2 under 
some additional assumptions. 

Theorem 4.3. For the maximum step size h = 0(n~ x ) with A > l/(p A 
4), under assumptions A1-A4, A9, A10 and B1-B7, and K > 2, we have 

ny 2 n -{3 o )AN(O,V 2 ). 

Remark 3. For the case h = 0(n~ A ) with l/[2(p A 4)] < A < l/(p A 4), 
similar results to case (ii) in Theorem 3.2 can be obtained. 

For K = 1, Theorem 4.3 does not hold, since in this case the special 
perturbation direction a*(t) given in (A. 4) is §^/§^, which leads to both 
Si in (A. 5) and S2 in (A. 6) to be zero (see the proof of Theorem 4.3 in the 
Appendix). In this article, we consider one special case that we assume there 
exists an additive relationship between (3 and n(-) as follows: 

(4.4) *%Q. = F{ t,X(t),(J + T,(t)}, 

which is a special case of (1.2), then the function X(t) has the form of 
X(t, (3 + n(t)). In this case, we are able to establish similar asymptotic 
normality results under the identifiability constraint Etr/(t) = 0. Note that 
Schick (1986) studied a similar problem under a semiparametric regression 
model and used the same identifiability constraint for the unknown function 
rj(t) to establish the asymptotic normality for the constant parameters. We 
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follow a similar idea and use B-spline approximation for rj(t). We center the 
B-spline estimator of i](t) as follows: 

N - n N N . n 



t=i j=i 



which is subject to the constraints Y^A=iVn(,U) = 0. Under similar assump- 
tions, the strong consistency and the rate of weak convergence of the estima- 
tors, similar to those of Theorems 4.1 and 4.2, can be obtained. In particular, 
the asymptotic normality can be established as follows: 

Proposition 1. For (4-4) with K=l, when the maximum step size 
h = 0(n~ x ) with X > l/(p A 4), under assumptions A1-A4, A9, A10, Bl- 

B5, B7 and in addition E t [r](t)] = 0, we have n 1/2 (/3 n - (3 ) A N(0, V 3 ), 
where V 3 = ff§{^(^) 2 } -1 with £ = Po + »»(*)■ 

The proof of this proposition is different from that of Theorem 4.3 and is 
given in the Appendix. 

Remark 4. By combining Theorem 4.3 and Proposition 1, we can see 
that the proposed sieve NLS estimator is asymptotically normal with a con- 
vergence rate of yjn for K > 2 under assumption B6, but we are only able 
to prove the result for a special ODE model (4.4) for K = 1. This is because 
the asymptotic covariance Vi defined in B6 is always singular in the case of 
K = 1, and is only possibly nonsingular in the case of K > 2. Since V2 is 
always singular for K = 1 , we derive the asymptotic distribution for the spe- 
cial ODE model (4.4) using a different approach which results in Proposition 
1. 



Similar approaches proposed in Section 3 can be used to estimate the 
asymptotic variance-covariance matrix for (f3 n , f] n {t)). For the first approach, 
the observed pseudo-information matrix can be evaluated by replacing rj(t) 
with the spline approximation -ir T (t)cx, that is, to rewrite the objective func- 
tion £2(0) in the expression (4.3) as H2(/3,a). Then the observed pseudo- 
information matrix l2((3,a) can be defined as 



d0 2 

d 2 E, 



d 2 E 2 

d/3da 
d 2 E 2 



\ 



V 



da 2 I 



da 8(3 

— 1/2 A 

The standard error of [/3 n ,a n ) is approximately X 2 ((3 n ,a n )/y/n from 
which the standard error of n can be obtained. We also find that the 
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inverse of the observed pseudo-information matrix provides a reasonable 
approximation to V2 via our simulation studies in the next section. 

Similarly the weighted bootstrap method can also be used. For (1.2), the 
weighted M-estimators (/9°,Q! n ) satisfy 

n K 

0° n , o&) = arg min £ £ [1^) - X.j(U,P, n(tfa)] 2 . 
i=X j=l 

Based on Corollary 2 and Theorem 2 in Ma and Kosorok (2005), given 
{ti, y(ti)}, %/n{fin — P n ) and y/n(/3 n — /3 ) have the same limiting distribu- 
tion which can be used to justify the weighted bootstrap for inference on (3 n 
and f)n(t). 

5. Numerical studies. In this section, we consider the HIV dynamic 
model described in Section 2. Recall that in this system, Tjj(t), Tj(t) and 
V(t) are state variables and (A, p,5, N,c,r](t)) T are kinetic parameters. By 
introducing the time-varying infection rate r/(i) in this HIV dynamic model, 
the model can flexibly describe the long-term viral dynamics. In clinical stud- 
ies, only viral load, V{t) and total CD4+ T cell count, T{t) = Tjj(t) + T/(t), 
are closely monitored and measured over time. For easy illustration and 
computational simplicity, we fix the parameters p and S in our numerical 
studies, and our objective is to estimate three constant parameters and one 
time- varying parameter, (A, N, c, n(t)) T based on measurements of viral load 
and total CD4+ T cell count. 

5.1. Monte Carlo simulation study. The following parameter values and 
initial conditions were used to simulate observation data for (2.1): Tjj(0) = 
600, T/(0) = 30, V(0) = 10 5 , A = 36, p = 0.108, N = 1000, 6 = 0.5, c = 3. For 
comparison purpose, we generated the measurement data of V(t) and T(t) 
for four scenarios in our simulation studies: (i) ij(t) = ij is a small constant, 
rj = 9.5 x 10~ 6 ; (ii) rj{t) is time- varying but with a smaller (10%) variation, 
rj(t) = 9 x 10~ 5 x {1 — 0.9cos(7ri/400)}; (iii) r/(t) = rj is a larger constant, 
rj = 3.84 x 10~ 5 ; and (iv) rj(t) is time- varying but with a large (10- fold) 
variation, rj(t) = 9 x 10~ 5 x {1 - 0.9cos(7rt/40)}. Note that for cases (i) and 
(iii), the values of constant 77 were chosen to be approximately the average 
of rj(t) over the period of time interval for cases (ii) and (iv), respectively. 

Let y\ = T = Tjj + Tj denote the total number of infected and uninfected 
CD4+ T cells and j/2 — V denote the viral load, the measurement models 
are given as follows: 

yu = T(ti)+e u , 
V2i = V(ti)+e 2 i, 
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where En and E2% are independent and follow normal distributions with mean 
zero and variances o\ { and respectively. The HIV dynamic model was 
numerically solved within the time range [0, 20] to generate the simulated 
data at each time interval of 0.5 using the 4-stage Runge-Kutta algorithm. 
Consequently, the corresponding sample size is 40. The 20% measurement 
errors were added to the numerical results of the ODE model according to 
the observation equations above. We applied the proposed estimation meth- 
ods in Sections 3 and 4 to the simulated data for the 4 cases to evaluate 
the performance of the proposed estimators and the effect of the model mis- 
specification. To stabilize the computational algorithm, we log-transformed 
the data. We also fixed parameters p and 5 as their true values. 

For evaluating the performance of the estimation methods, we define the 
average relative estimation error (ARE) as 



where 9j is the estimate of the parameter vector 9 from the jth simulation 
data set, and M = 500 is the total number of simulation runs. 

In Table 1, the AREs of the constant parameters (X,N,c) are listed. In 
addition, we also report ctq DE as the average of the estimated variance by 
the observed pseudo-information matrix and cr^ mp as the empirical variance 
based on simulation runs. Based on these results, we can see that, when 
the change of rj(t) is small as a function of time t or rj is a small constant, 
the estimation of parameters is always good by fitting a constant rj model 
as observed by the low ARE values. However, when the change of r/(t) is 
large or 77 is a large constant, misspecification of r/(t) may produce large 
AREs for all parameter estimates. In particular, when r/(t) is time-varying 
with a large variation, using a constant r\ model may result in very poor 
estimates for all constant parameters. The variance estimates based on the 
pseudo-information agree well with the empirical estimates based on sim- 
ulations, which shows that the pseudo-information-based variance estimate 
is reasonably good. The evaluation of the bootstrap variance estimation is 
prohibited in our simulation study due to high computational cost. 

In Figure 1, the average trajectories of estimated r}(t) are compared to 
the true trajectories of rj(t) for four different scenarios. From this figure, 
we observed a similar trend as the constant parameter estimates. The mis- 
specification of rj(t) produces estimation error, in particular for the cases 
with a large variation of rj (t) or a large constant rj. When the model of 
rj(t) is correctly specified, the estimates based on the proposed methods are 
reasonably good. In order to evaluate the robustness of the proposed ap- 
proach, we also performed further simulation studies for a complex function 
r)(t) = 9.0 x 10~ 6 + 9.0 x 10~ 7 xtx{l- 0.5sin(vrt/5.8)} under the same 
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Table 1 

Simulation results for constant rj and the time-varying r/(t) models. The ARE is calculated based on 500 simulation runs for the HIV 
dynamic model. In addition, (Tqde * s ^ e average of the estimated variance by the observed pseudo-information, and o"e mp is the 
empirical variance based on simulations. The sample size is n = 40 and the noise level is about 20% 



Change 

of V (t) 


True T](t) 
model 


Fitted 
r){t) model 




A 






N 






c 




ARE(%) 


"ODE 


2 

" emp 


ARE(%) 


""ode 


2 

" emp 


ARE(%) 


"ode 


2 

"emp 


Small 


Constant 


Constant 


3.19 


2.49 


1.91 


17.7 


3.23e+04 


4.94e+04 


17.4 


0.313 


0.425 






Time- varying 


6.45 


9.82 


8.77 


22.9 


7.14e+04 


8.63e+04 


20.5 


0.593 


0.635 




Time- varying 


Constant 


3.77 


2.38 


2.08 


17.9 


3.36e+04 


4.71e+04 


19.8 


0.331 


0.432 






Time- varying 


6.40 


9.16 


8.55 


22.6 


6.53e+04 


7.93e+04 


20.9 


0.543 


0.637 


Large 


Constant 


Constant 


6.29 


8.19 


12.1 


72.5 


1.13e+06 


8.53e+05 


67.3 


9.22 


6.75 






Time- varying 


7.34 


9.19 


15.6 


88.8 


3.25e+06 


1.13e+06 


82.5 


26.2 


8.96 




Time- varying 


Constant 


94.2 


13.7 


7.02 


994 


5.86e+07 


1.25e+08 


1899 


1660 


3780 






Time- varying 


15.6 


31.5 


48.2 


29.5 


1.67e+05 


1.67e+05 


25.1 


1.81 


1.37 



< 
H 

H 

CO 

H 

I — I 

g 
O 

'A 

7, 

O 
W 
O 

> 



H 
H 

•z 

> 
tr 1 

H 
£> 

i — i 

o 

2! 



O 
O 

H 
tr 1 



20 



H. XUE, H. MIAO AND H. WU 



(a) True rj(t): smaller constant 
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Fig. 1. Simulation results for constant r\ and the time-varying rj(t) models. In each figure, 
the true model of n (solid), the constant n model (dotted) and the time-varying n(t) model 
(dash-dotted) are plotted and compared. 



simulation settings (i.e., 40 time points, 20% error, 500 simulation runs). 
The results suggest that the sieve estimator can still capture the essential 
pattern of the complex r\ (t) reasonably well (plots not shown). 

5.2. Application to AIDS clinical data. To illustrate applicability and 
feasibility of our proposed methods and theories, we also applied the pro- 
posed estimation methods to fit the HIV dynamic model to a clinical data 
set obtained from an HIV-1 infected patient who was treated with an an- 
tiretroviral therapy. Very frequent viral load measurements were collected 
from this patient after initiating the antiretroviral regimen: 13 measurements 
during the first day, 14 measurements from day 2 to week 2, and then one 
measurement at weeks 4, 8, 12, 14, 20, 24, 28, 32, 36, 40, 44, 48, 52, 56, 64, 
74 and 76, respectively. In addition, the measurements of total CD4+ T cell 
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counts were also taken at Day 1, weeks 2 and 4, and monthly thereafter. 
Equation (2.1) was used to estimate HIV kinetic parameters using the viral 
load and total CD4+ T cell data. 

For simplicity of illustration and computation, we fixed the initial con- 
ditions of the state variables in (2.1) as Tu(0) = 1, T/(0) = 551, V(0) = 
6.38 x 10 4 , which were derived from the baseline measurements. We also 
fixed two parameters, as in the simulation study, p = 0.10 and 5 = 0.434, 
which were taken from the estimates in literature. Our objective is to esti- 
mate the three constant parameters (A, N, c) and the time- varying parameter 
r)(t) as in the simulation study. As we proposed in Section 4, we employed 
B-splines to approximate rj(t). We positioned the spline knots at equally- 
spaced time points (the log-time scale was used since the distribution of 
observation time points is highly-skewed). We selected the order of splines 
and the number of spline knots using the model selection criterion AICc 
given by 



where RSS is the residual of the sum of squares obtained from the NLS model 
fitting, n is the total number of observations and k is the number of unknown 
parameters [including the coefficients in the spline representation of rj(t)]. 
Note that as a practical guideline, if the number of unknown parameters 
exceeds n/40 (where n is the sample size) , the AICc instead of AIC should be 
used. For our clinical data, the sample size n is equal to 65, and the number 
of unknown parameters varies between 6 and 13 for different scenarios, which 
is much larger than n/40 = 65/40 = 1.6. Thus the AICc is more appropriate 
for our applications. In general, the AICc converges to the AIC as the sample 
size gets larger, thus the AICc is often suggested to be employed regardless 
of the sample size [Burnham and Anderson (2004)]. For our application, we 
used AICc and compared the models with the splines of order 3 and 4, and 
the number of knots from 3 to 10. In Table 2, the AICc values for these 
different models are reported, from which the best model was selected as 
the spline with order 3 and 5 knots for r)(t) approximation. 

We used the weighted bootstrap method to calculate both the confidence 
intervals for the constant parameters and the confidence bands for the time- 
varying parameter. The basic idea of the weighted bootstrap method is pro- 
vided in Sections 3.2 and 4.2. For the computational implementation, we 
first generated a positive random weight for each data point in the raw data 
set from the exponential distribution with mean one and variance one. By 
repeating this step, a large number of (say, 1000) sets of weights can be gen- 
erated. Second, for each set of weights, the ODE model is fitted to the data 
to obtain parameter estimates by minimizing the weighted residual sum of 
squares (see Sections 3 and 4). Recall that the time- varying parameter in 
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Table 2 

Model selection results for B-spline approximation of the 
time-varying parameter rj(t) 



Model 


Spline 


Number of 


AICc 




order 


knots 




1 




3 


-222.3 


2 




4 


-242.8 


3 




5 


-252.8 


4 




6 


-243.8 


5 


3 


7 


-250.6 


6 




<s 


-246.0 


7 




9 


-246.8 


8 




10 


-244.4 


9 




3 




10 




4 


-233.2 


11 




5 


-230.3 


12 




6 


-242.6 


13 


4 


7 


-249.1 


14 




8 


-245.5 


15 




9 


-244.9 


16 




10 


-240.5 



the model has been approximated by B-splines, then both the constant pa- 
rameters and the constant B-spline coefficients are actually estimated. Once 
the estimates of the B-spline coefficients are obtained, we construct the B- 
splines which approximate the time-varying parameter. Thus, we eventually 
obtain 1000 estimates for each constant parameter and 1000 B-splines for 
each time-varying parameter. Third, for each constant parameter, we select 
the 2.5% and 97.5% quantiles of the 1000 estimates to form the 95% con- 
fidence intervals for this parameter. For the time-varying parameter, at a 
single time point, the 1000 B-splines have 1000 values. We also select the 
2.5% and 97.5% quantiles of the 1000 values at this time point to eventually 
form the 95% pointwise confidence bands for the time-varying parameter. 

Model fitting results are given in Figure 2 and Table 3. From Figures 2(a) 
and (b), we can see that the fitting is reasonably good for both CD4+ T cell 



Table 3 

The constant parameter estimation results 



Parameter 


Estimate 


95% confidence interval 


A 


46.52 


[43.20, 51.04] 


N 


1300.39 


[251.93, 4628.26] 


c 


4.35 


[0.98, 14.83] 
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(a) Total CD4+ T cell versus time (b) Viral load versus time 

x10 4 

8: 

6 



> 4 
2 

200 400 

Time (day) 

(c) Estimated r\(t) with 95% bootstrap quan- 
tile confidence intervals (dashed lines) 

2 x10' 5 | 



1.5 




200 400 

Time (day) 

Fig. 2. Model fitting results with n(t) approximated by B-splines of order 3 and 5 knots. 

counts and viral load data. The estimates of constant parameters (A, N, c) 
are listed in Table 3, and the 95% bootstrap confidence intervals of the 
estimates are also provided. The uninfected cell proliferation rate (A) was 
estimated as 46.52 cells per day, the average number of virions produced by 
one infected cell (N) was estimated as 1300 per day and the clearance rate of 
free virions was 4.35 per day which corresponds to a half-life of 3.8 hours. All 
these estimates are in the ballpark of similar estimates from other methods 
[Perelson et al. (1996, 1997)]. In Figure 2(c), the estimated trajectory of the 
time- varying parameter r](t) (the viral infection rate), is plotted with 95% 
bootstrap quantile confidence intervals, which shows an initial fluctuation 
but converges to a constant after 2 to 3 months. 

6. Discussion. In this paper, we have systematically studied numerical 
solution-based NLS estimators for general nonlinear ODE models which the 
closed-form solutions are not available. Both constant and time-varying pa- 




200 400 
Time (day) 
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rameters are considered. For the model involved time-varying parameters, 
we formulated the estimator under the framework of sieve approach. Our 
main contribution is the establishment of the asymptotic properties for the 
proposed numerical solution-based NLS estimators (including the sieve NLS 
estimator for the time-varying parameter) with consideration of both nu- 
merical error and measurement error. Our results show that if the maximum 
step size of the p-order numerical algorithm goes to zero at a rate faster than 
ji -1 /(p a4 ) ) the numerical error is negligible compared with the measurement 
error. This provides guidance in selecting the step size for numerical evalua- 
tions of ODEs. Moreover, we have shown that the numerical solution-based 
NLS estimator and the sieve NLS estimator for the model with a time- 
varying parameter are strongly consistent. The sieve estimator of constant 
parameters is asymptotically normal with the same asymptotic co-variance 
as that of the case where the true solution is exactly known, while the esti- 
mator of the time- varying parameter has an optimal convergence rate under 
some regularity conditions. We also obtained the theoretical results for the 
case when the step size of the ODE numerical solver does not go to zero fast 
enough or the numerical error is comparable to the measurement error [see 
case (ii) of Theorem 3.2 and Remark 3]. To our best knowledge, this is the 
first time that the sieve method has been extended to the case of ODE mod- 
els which have no closed-form solutions, and the sieve-based theories were 
used to establish the asymptotic results and construct confidence intervals 
(bands) for both constant and time-varying parameters. Note that we only 
considered a single time-varying parameter in the model, but the method- 
ologies can be extended to multiple time-varying parameters although it is 
more tedious to implement. 

Note that the NLS estimators have good properties under some assump- 
tions and are more accurate compared to other estimates such as those 
proposed in Ramsay et al. (2007), Chen and Wu (2008) and Liang and Wu 
(2008). But the price that we have to pay is the high computational cost 
to obtain the NLS estimates. To reduce the computational burden, we may 
use the rough estimates from other methods [Ramsay et al. (2007), Chen 
and Wu (2008), Liang and Wu (2008)] to narrow down the search range 
for the NLS optimization algorithm. More efficient optimization algorithms 
may also be employed to speed up the computation. We are also considering 
to parallel our global optimization algorithms on high-performance comput- 
ers. Hopefully these efforts can help us to handle a reasonable size of ODE 
models. 

This article only considered the initial value problem (IVP), that is, the 
initial conditions are assumed to be given. In practice, the initial conditions 
can be estimated from the data. However, the generalizations of the theoret- 
ical results to the cases of estimated initial conditions and other boundary 
value problems as well as constraints on parameters are not trivial. Also 
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note that, if there is more than one time-varying parameter in the model, 
similar identifiability techniques in Section 2 may be applied to these pa- 
rameters one by one, sequentially. Spline approximation to these multiple 
time-varying parameters can be used for estimation. But the computation 
and theoretical results are more complicated in this case. However, these 
generalizations are worth further investigations in future. 

APPENDIX: PROOFS 

Lemma 1. Under conditions A1-A5, sup ter ||X(t,/3) — X(t, /3)||oo = 
0(h pM ) for any given (3 £ B in (1.1). 

Proof. By Theorem 3.4 in Hairer, N0rsett and Wanner [(1993), page 
160] under conditions A1-A5, for the pth order numerical algorithm (3.2) 
for (1.1), its global discretization error satisfies 

max ||X(si, /3) -X(s i; /3) ||oo = 0(/i p ) for given f3 G B. 

0<i<m—l 

When t is not coincident with the grid points of the numerical algorithm, 
the cubic Hermite interpolation [de Boor (1978), page 51] will be used to 
obtain the solution at time t. In this case, 

sup ||X(t,/3)-X(t,/3)|| 0O = 0(/i 4 ). 

tel\{s t : 0<i<m-l} 

Then it follows that 

sup||X(t,/3)-X(t,/3)|| 0O 
tei 

< sup \\X(t,0)-X(t,(3)\\oo 

t£l\{si : 0<«<m-l} 

+ max ||X(t,/3)-X(i, 0)1100 

tejsi : 0<i<m-l} 

= 0{h 4 ) + 0(h p ). 

In general, h is less than 1, 0(h 4 ) + 0{h p ) = 0(/i pA4 ), which completes the 
proof. □ 

Moreover, Lemma 1 can be extended to the ODE model (1.2) with both 
constant and time- varying parameters, since for this model, it can be verified 
that the result of Theorem 3.1 in Hairer, N0rsett and Wanner [(1993), page 
157] is still valid for any given f3 G B and rj G A under condition B2 (it can be 
derived using the Taylor expansion and the Chain rule), which leads to the 
same conclusion as Theorem 3.4 in Hairer, N0rsett and Wanner [(1993), page 
160]. For Theorems 3.1 and 3.2, the proofs for the univariate and multivariate 
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cases are the same. For presentation and notation simplicity, we only outline 
the proof for the univariate case below. 

Proof of Theorem 3.1. Denote M n (f3) = ^ ZlILit^C^) ~ ^(^,/3)] 2 , 
M n (P) = ±EtilY&)-X(^P)} 2 and M(J3) = [Y(t) - X(t, (3)] 2 . 

First, we claim that Eq[M(/3)\ reaches its unique minimum at (3 = (3 . In 
fact, 

E [M((3)] = E [Y(t)-X(t,f3)] 2 

= E [Y(t) - X(t,f3 ) + X(t,(3 ) - X(t,f3)} 2 
= E [Y(t) - X(t, /3 )] 2 + E t [X(t, (3 ) - X(t, (3)} 2 
= E [e(t)] 2 + E t [X(t,(3 )-X(t,(3)] 2 
>E [e(t)f = E [M(f3 )}, 

where the third equality holds because the intersection term equals zero 
according to the following calculation: 

E [e(t)][X(t,f3 )-X(t,f3)} 

= E t E {[e(t)][X(t,(3 )-X(t,/3)]\t} 
= E t {[X(t,f3 )-X(t,f3)]E [e(t)}} 
= 0, 

because of E [e(t)] = 0. Moreover, E t [X(t,/3) - X(t,f3 )} 2 = if and only if 
(3 = /3q from assumption A6. Thus the above claim holds. Under assumption 
A10, it follows that the first-order derivative a£ °[g (/3)] of E [M(/3)] at (3 
equals to zero and the second-order derivative - ^q^t^ of Eq[M((3)] at (3 
is positive definite. By assumptions A7 and A9, the second-order derivative 
of Eq[M(/3)] in a small neighborhood of (3 is bounded away from and oo. 
Then the second-order Taylor expansion of Eq[M{(3)\ gives that there exists 
a constant < C < oo such that 

^[M^J-M^o^CH^-^oll 2 . 

Thus it is sufficient to prove E [M((3 )] — E [M((3 n )] — > 0, a.s. 

Let Ni(e, Q,J-) be the covering number of the class J- in the probability 
measure Q, as given in Pollard (1984, page 25). From Lemma 4.1 in Pollard 
(1990), we have that Aq(e, L 2 ,B) < (^-) d . Let T n be the set {M n (/3) : (3 € 
£>}. With the Taylor expansion, for any /3 1; /3 2 € B, we can easily obtain 
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where C is some constant. Then for any probability measure Q, we have 

supJVi(e,Q,.F n ) < JVife/C.La.tf) < c(^j for < e < 1. 

Then by Theorem 11.37 in Pollard (1984), su P/3 \M n (f3) - E M(/3)\ -»• 0, 

a.s., under P^. Then we have M n n ) - E o [M0 n )] ->■ and M n (/3 ) - 
£ [M(/3 )]^0, a.s. 
Next, by Lemma 1, 

n 

M n (/3) = -V[Y(^)-^,/3)] 2 
n 

8=1 

n 

= - Et y (^) - x &>® + o(^" A(pA4) )] 2 

n 

= - El y ('i) - ^C.")] 2 + 0(n- x * M) ) 
i=l 

= M n (/3) + 0(n- A ^ A4 )). 

Then 

M re 0J-£ o [M(/3 o )] 

>M n (/3J-£o[M03 n )] 

= M n (f3 n ) + 0(n" A ^ A4 )) - SolM^J] 

and 

M n (£j-£ [M(/3 )] 

<M n (/3 )-£ [Af(/3 )] 
= M n (/3 ) + 0(n- A ^ A4 )) - EqM^o). 
Hence M n n ) - E [M(j3 )} -> 0, a.s. Thus 
|£ [M(4j]-£ [M(/3 )]| 

< |M„0J - So[M0J]| + |M„0 n ) - Po[M(/3 )]| a.s. 

Since /3 is the unique minimum point for £'o[M(/3)], (3 n is almost surely 
consistent with respect to Pp . □ 

Proof of Theorem 3.2. For the proof of part (i), it suffices to verify 
conditions of Theorem 2 in Pollard (1985). Denote G n (/3) = ± Yl?=i[ Y (U) - 
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!&,/3)]^M GM = ±£r=i^(*i) " X(t h l3)}^%^ and G(/3) = 

^[F(t)-X(t,)9)] ^§^. Obviously, G n n ) = and G(/3 ) = E t E ({[Y(t)- 

X(t,f3 )] ^%^}\t) =0 from Eb[T(t)|t] = X(t,/3 ). 

First, we verify the following result: ^[G n (/3 ) - G((3 )\ A iV(0,i?x). For 
fixed t, according to the multivariate inequality of Kolmogorov type for Li- 
norms of derivatives [Babenko, Kofanov and Pichugov (1996), page 9], we 

h n ax{t,p) ax(t,j9) I, < r u d 2 Mm a 2 x(t, i 3) ||i/2||.y. r R) xCfmn 1/2 < 
nave ||— ^ s Ml a/3ai3 T a/39/3 T l|o ° ll^'W _ -M^PJIloo - 

~ 1/2 

C"||X(t,/3) — X(t,/3)||oo for two constants C and C , where the second in- 
equality holds because of the uniform boundedness of both and 

9 q^q0t under conditions A7 and A8. Based on sup te/ ||X(t, (3) — X(i, /3) ||oo = 

0(n" A (P A4 )) from Lemma 1, it follows that ||^^ -^r^ll = 0(n" A (P A4 )/ 2 ). 

Considering that Y{ti) — X(ti,(3 ) and 9X g^ ^ are bounded, we have 

>M[G n (p )-GQ3 )] 



t=i 
1 n 



X(t i ,A>) + 0(n-*<' M >)] 



i=l 



0(n 



-A( P A4)/2n 



1 

v i=i 



XfaPo)] + 0(n -A(pA4)/2+l/2, 



d(3 



When A>l/(pA4), („-a(pa4)/2+i/2^ 
we have 

MG n (/3 )-G<J3 )] 



o(l). So for the above expression, 



= VS[G„(/3 )-G(3„)] + o(l). 
Based on the general central limit theorem, y/n[G n (j3Q) — G(/3 ) 



iV(0,Hi) 



with 



U^EolY^-Xit^o)}' 



dX(t,(3 ) 
dP 



dX(t,(3 ) 
df3 



Second, let 5 n 1 0. For ||/3 — /3 || < <5 n , we want to show that 
MG n ((3) - G{(3)} - MGn([3 ) - G({3 )} = Op(l). 
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In fact, from the first step above, for any /3 G B, we have that ^/n[G n {j3) — 
G n ((3)]=o p (l). Then 

MGn(P) - G(p)\ - MGn((3 ) - G((3 )] 

= MGn(P) - G(0)\ - MGn(Po) ~ G(Po)} + o p (l). 

From Lemma 4.1 in Pollard (1990), we have that N x {e,L 2 ,B) < {?f) d . Let 
A n be the set {G n ((3) : (3 G £>} for any X G X . Using a Taylor series expan- 
sion, for any (3 l: (3 2 £ B, we can easily obtain 

\G n {f3 l )-G n ((3 2 )\<C\\f3 1 -f3 2 l 

where C is some constant. Then for any probability measure Q, we have 

JVi(e, L 2 (Q),A n ) < N^e/C, L 2 ,B) < C 
and thus 

logiVi(e,L 2 (Q),A n )<dlogi. 

Since J 1 log(l/e) < oo, A n is a P-Donsker class by Theorem 2.5.2 in van 
der Vaart and Wellner (1996). Hence y/n[G n ((3) - G{j3)} - y/n[G n (j3 ) - 
G(/3 )]=o p (l). 

Third, with some simple calculations, we have G((3) = Et[X(t, f3 ) — X(t, 
13)] then 

and ^\^ = -E t {°*^±}® 2 . Denote H 2 = E t {°*$M}® 2 . Then by 
using the Taylor series expansion again, the function G((3) is 
Frechet-differentiable at (3 with nonsingular derivative H2. 

Thus all conditions of Theorem 2 in Pollard (1985) are satisfied, then 
Theorem 3.2(1) holds with Vi = H^ 1 H 1 (H^ 1 ) T = a\{E t [^^o)]® 2 }" 1 . 

For the proof of case (ii) of Theorem 3.2, it is easy to verify the con- 
ditions of Theorem 2 in Pollard (1985) for the asymptotic normality. Now 
we just need to show p = (3 + 0(h^l 2 ) and V 1 = V 1 + 0{h^/ 2 ). De- 
note M(/3) = [Y(t) - X(t,P)} 2 and G(J3) = E [Y{t) - X(t,p)] ^J^. Since 

Eq[M((3)\ reaches its minimum at /3 = /3, then the first-order derivative of 
Eq[M(P)] at j3 equals 0, that is, G{P) = 0. Then similar to the proof of case 
(i) above, we have 

G(P) = E [Y(t)-X(t,P)]^M^ 
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= E [Y(t)-X(t,P) + O(hP M )} 

= G(P) + 0(h^/ 2 ). 

It follows that G0) = G0) + 0(h^ M V 2 ), then G0) = 0{h^l 2 ) from 
G(j3) = 0. The Taylor series expansion yields that there exist constants < 
c\ , c 2 < oo such that 

cifP - P \\ < \G0) - G((3 )\ < c 2 -/3 \\. 

Thus \\(3 - O \\ = 0(/i( pA4 )/ 2 ) from G(J3 Q ) = 0. Similarly we can show that 
||Vi-Vi|| = 0(^ A4 )/ 2 ). □ 

Some definitions and notation are necessary in order to prove Theorems 
4.1-4.3. Denote M n (0) = ± £? =1 EjLiKfo) " XjfaPMU))] 2 , M n {6) = 

^ELiEf^K^-x^t^A^^andMW^Ef^K-XiCt^,^))]^ 

We define a semidistance p on as 

p 2 (0, G ) = E {(P - (3 ) T Mi(e) + M 2 (0)[ V - Vo ]} 2 , 

where M\ is the score function of M for /3, and M 2 is the score operator 
of M for rj, both evaluated at the true parameter value 9q. Similarly to 
the proof in Huang and Rossini [(1997), page 966] when V 2 (6q), defined 
in assumption B6, is positive definite, and Mi and M 2 are bounded away 
from +00 and —00, if p(O n ,0o) = O p (r n ), then d(# n ,#o) = O p (r n ); and if 
p(6 n ,0o) — > almost surely under Pq , then d(0 n ,6o) — > almost surely 
under Pg . 

Proof of Theorem 4.1. Similarly to the proof of Theorem 3.1, we 
have that Eq[M(0q)] reaches its unique minimum at 6 = 9q. It follows that 

E o [M(0 o ) - M{0 n )] > c P 2 (e n ,e ), 

where C is some constant. Thus if Eq[M(0q)] — Eo[M(0 n )] — > 0, almost 
surely under Pg , then d(6 n ,0o) — > 0, almost surely under Pg . 

Let A s n be the set {rj G An, \\v — %o||2 < <5} and N 2 {s : L 00 ,A^) be its 
bracketing number with respect to [see Definition 2.1.6, van der Vaart 
and Wellner (1996)], where rj n o is the map point of 770 in the sieve A n . By 
the calculation of Shen and Wong [(1994), page 597] for any e < 5, we have 

N 2 (e,L 00 ,^ l )<C(6/e) N , 

where N = q + 1 is the number of B-splines basis functions. Let J- n be the 
set {M n (0):||/3-/3 o || <5,i]£ A n , \\rj - r] n0 \\ 2 < 5}. For any U 2 £ n , we 
can easily obtain 

\M n {0 x ) - M n {0 2 )\ < CGI/3! - p 2 \\ + \\ m - ^lU) 



dX(t,P) 
dp 



+ 0(h^l 2 ) 
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using Taylor's expansion. Hence 

N 2 (e,L 00i JF n ) < JVi(e/2,L 2 ,jB) x N 2 (e/2, LooX) 

< C(m d /e) d {5/e) N 

< C'{l/e) N+d . 

Note that, since N 2 (e, L^,^) depends on n in the above expression, we 
cannot directly use Theorem 11.37 in Pollard (1984) to obtain supj- n \M n (0) — 
Eq[M(6)] \ — > 0, a.s., under Pg . Fortunately, we can still get this result based 
on (A.2) in Xue, Lam and Li (2004). Thus we have M n (9 n ) - E [M(9 n )] -> 
and M n (6 n o) — Eq\M{Q„q)] — > 0, a.s., where 9 n Q is the map point of 6$ in 
the sieve 0„. 

From the extension of Lemma 1 for any given j3 S B and rj(t) G A in (1.2), 
similarly to (A.l), we have 

(A.2) M n (9) = M n (9) + 0(n~ x ^). 

Then the remaining steps are similar to those in the proof of Theorem 3.1. 
□ 



Proof of Theorem 4.2. We apply Theorem 3.4.1 in van der Vaart 
and Wellner (1996) to obtain the rate of convergence. 

For 6 n o in the proof of Theorem 4.1, define 8 n o h-> pi(0,9 n o) be a map 
from 6 n to [0,oo) as />?(#, nO ) = E O [M(0)} - E [M(9 n0 )}. Choose S n = 
p(9 ,9 nQ ). For 5 n < 5 < oo, denote Q = {9:9 £ @ n ,S/2 < p(9,9 n0 ) < 5}. 

From the definition of pi, we have supQ Eo[M(Q n o)] — Eq[M(9)] < 

Let E n be the set {M n (9) - M(8 n0 ) : 9 £ 8 n } and J(5, L 2 (P),E n ) be the 
L2(.P)-norm bracketing integral of the sieve G n . From the proof of Theorem 
4.1, we have 

J(8,L 2 (P),E n )= / y /l + \ogN 2 (e,L 2 (P),E n )de 
Jo 

r s 

< / y/l + logN 2 (e,L 00 ,S n )de 
Jo 

< CN 1/2 5. 

Let 

MS) = AS, L 2 (P),E n ) (l + J^P),E n ) \ = Nl/H + N 

Obviously, (f) n (5)/5 1+T is a decreasing function in 5 for < r < 1. Then by 
Lemma 3.4.2 in van der Vaart and Wellner (1996), we have 



En 



supVn(M n -M)(0-0 nO ) ±M$)- 
■ n J 
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For A > l/[2(p A 4)], from (A.2), it follows that 

MM n (0) ~ M n (0)} = 0(^/2^4) ) = o(1) 

Then we have that 

E [sup^(M„ - M)(0 - e n0 )] 1 <f>n(6). 
1 n J 

Then the conditions of Theorem 3.4.1 in van der Vaart and Wellner (1996) 
are satisfied for the 5 n , pi and 4> n (5) above. Therefore we have -r 2 pi(6 n , 6 n0 ) 
O p (l), where r n satisfies r^(j) n {^-) < y/n. It follows that r n = N~ l / 2 n 1 / 2 = 
n (i-v)/2 m Thus pi (0 n ,0 nO ) = O p (n-( 1 ^)/ 2 ). 
Now, we define a distance p2 as 

0i(0i,02) = WPi - 02 II + Wm - rnWoo- 

Let ? be a positive constant. Similarly to the proof of Theorem 3.2 in Huang 
(1999), it is easy to follow that for any with p2(6,Qno) < there exist 
constants < c\ , c 2 < oo such that 

- Cl d 2 (0,e n0 ) + O p (n~ 2 ^) < -pl(0,e nO ) < -c 2 d 2 (0,O n0 ) + O p (n~ 2v e). 

Therefore, for a constant C2 > 0, 

2 (b„ a^\< r>..(n- 2v e ± n -{ l --»)\ 



c 2 d (0 n ,6 n0 ) < O p (n 



Because d(6 n0 ,0 ) < p 2 (0o,0o) = O p (n~ l ' s ), we have d(0 n ,0 ) = Op(n~ vs + 

n -(l~v)/2y n 

Proof of Theorem 4.3. We prove this theorem using Theorem 6.1 
in Wellner and Zhang (2007). It suffices to validate conditions A1-A6 of 
Theorem 6.1 in Wellner and Zhang (2007). From the proof of Theorems 4.1 
and 4.2 above, it is easy to see that condition Al regarding consistency and 
rate of convergence and condition A2 for Theorem 6.1 in Wellner and Zhang 
(2007) hold. 

For condition A3, we need to calculate the pseudo-information matrix. 
For any fixed rj G A, let .Ao = {Vu(') :UJ m a neighborhood of G 1Z} be a 
smooth curve in A running through t]q at U) = 0, that is, 77^=0 (t) = %(*)■ 
Denote ^;??aj(i)U=o = o,(t) and the space generated by such a(t) as T. The 
score functions of f3 and r\ are 

M 2 [a] = ^ = -2f:(Y j -X j )^ia(t). 
drjo ~i dr >o 
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We also set 



M 



11 



d 2 M 



K 



M 12 [a]=M 2 \[a) 



d 2 M 
d(3 drj 



d/3 d/3 



o 



d(3 d(3 



K 



2 £ 



«9/3 5r?o 1 J 3) d(3 d Vo 



o(t) 



and 



M 22 [oi,a 2 ] 



a 2 M 



2 £ 



* r 'dX^ 2 



9% 



ai(i)a 2 (i), 



where ai(t),a 2 (t) G T. Following the idea from the proofs of the asymp- 
totic results for semiparametric M-estimator in Ma and Kosorok (2005) 
and Wellner and Zhang (2007), we assume that the special perturbation 
direction a*(i) = (a*(t), . . . , a^(t)) T with a*(t) 6 T for 1 < i < d, satisfies 
Eq{Mi2[o\ — A-/22 [ a * 7 a ] } = for any a 6 T. Some calculations yield 
E {Mi 2 [a] - M22 [a* , a] } 



\d/3 5% 



X, 



ax, 

drjo 



3/3* drj 
d 2 X 







K 

2 ^2 EfEo 
3=1 



dx j ox j (y ._ x ., d 2 Xj 



df3 di] 



d/3* drft, 
d 2 X, 



a(t) 
a(t)a*(t) 
a(t) 



a(t)a*(t) 



It follows that 



Z^EMdXj/dPodXj/drio - (Yj - XjgX^/ggp d m }\t] 
Ef =1 S [{(aX J /ar ?0 ) 2 - {Y j - X^Xj/dr) 2 }^] 



(A.3) a*(t) 



Since £Jo[Y(i)|i] = X(i), a*(i) in (A.3) can be simplified as 
(A.4) a *(t) = ^ J - 1 J/ J/ ' 



Zf=i(dXj/dVo) 2 



For K > 2, both 



A* 



(A.5) S 1 = J B (A ; /ii-M 12 [a*]) = 2^^ 



ax, ax, axj 



a'(t) 
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and 

(A.6) S 2 = Eq{M\ - M 2 [**]r = 4E^{^ - ^a*(*)} 

are nonsingular. Let V2 = Sj~ 1 S2(Sj~ 1 ) T . Thus condition A3 of finite variance 
for Theorem 6.1 in Wellner and Zhang (2007) is satisfied. 

Conditions A4 and A5 for Theorem 6.1 in Wellner and Zhang (2007) 
can be verified by similar arguments as condition (i) and C3 in the proof 
of Theorem 4 in Xue, Lam and Li (2004), respectively. Condition A6 of 
smoothness of the model can be easily verified using a straightforward Taylor 
expansion where n~ Cl is just the rate of convergence in Theorem 4.2 and 
faster than ra" 1 / 4 , and c 2 = 2, which completes the proof. □ 

Proof of Proposition 1. Let Q be the set of a real valued functions 
g on [a, b] which are absolutely continuous and satisfy f g 2 {t)dt < 00 and 
Etg{t) = 0. Similarly to the proof of Theorem 4.3, for any fixed rj € A, let 
Aq = {rjuj(-) :wina neighborhood of G 1Z} be a smooth curve in A running 
through rjo at uj = 0, that is, 77^=0 (i) = Vo{t)- Denote ^^(t)^^ = a(t) and 
restrict a(t) G Q. Denote the space generated by such a(t) as T. The score 
functions of (5 and r\ are 

*6M = ^ = -2(y-X)^a(t) 

with £ = A) + %(£)• Let P be the linear span of M2[a]. Since Eq{MiM2[o\} = 
for any a(t) G T, it follows that M\ is orthogonal to P. Thus the efficient 
score function of /3 is just M\. Then the pseudo- information is i?o[-^i]- The 
rest of the proof is similar to that of Theorem 4.3, where the efficient score 
function and the pseudo-information are updated as discussed before, and 
the least favorable direction can be selected by any a G T. □ 
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